Abstract We introduce two new concepts designed for the study of empirical processes.
First, we introduce a new Orlicz norm which we call the Bernstein-Orlicz norm.
This new norm interpolates sub-Gaussian and sub-exponential tail behavior.
In particular, we show how this norm can be used to simplify the derivation of
deviation inequalities for suprema of collections of random variables. Secondly, we
introduce chaining and generic chaining along a tree. These simplify the well-known
concepts of chaining and generic chaining. The supremum of the empirical process
is then studied as a special case. We show that chaining along a tree can be done
using entropy with bracketing. Finally, we establish a deviation inequality for the
empirical process for the unbounded case.
1 Introduction



We introduce a new Orlicz norm which we name the Bernstein-Orlicz norm. It
interpolates sub-Gaussian and sub-exponential tail behavior. With this new norm, we
apply the usual techniques based on Orlicz norms. In particular, we derive deviation
inequalities for suprema in a fairly simple and straightforward way. The
Bernstein-Orlicz norm captures Bernstein's probability inequalities, and its use puts further
derivations in a unifying framework, shared for example by techniques for the
sub-Gaussian case, such as those for empirical processes based on symmetrization and
Hoeffding's inequality.

We furthermore introduce chaining and generic chaining along a tree, which we
believe is conceptually simpler than the usual chaining and generic chaining. We
invoke it for the presentation of maximal inequalities for general random variables
with finite Bernstein-Orlicz norm. The supremum of the empirical process is then
studied as a special case, and we show that chaining along a tree can be done
using entropy with bracketing. We establish a deviation inequality for the empirical
process indexed by a class of functions $\mathcal{G}$, in terms of the new Bernstein-Orlicz norm.
The class $\mathcal{G}$ is assumed to satisfy a uniform Bernstein condition, but need not be
uniformly bounded in supremum norm.

The paper is organized as follows. In Section 2 we introduce the Bernstein-Orlicz
norm and discuss the relation with Bernstein's inequality. We then present some
bounds for maxima of finitely many random variables (Section 3) or suprema over
a countable set of random variables (Section 4). Section 4 also contains the concept
of (generic) chaining along a tree. The proofs of the results in Sections 2, 3 and 4
are elementary and given immediately following their statement. Section 5 contains
the application to the empirical process. The proofs here are more technical, and
given separately in Sections 6 and 7.



2 The Bernstein-Orlicz norm 



Consider a random variable $Z \in \mathbb{R}$ with distribution $P$. We first recall the general
Orlicz norm (see e.g. Krasnosel'skii and Rutickii [1961]).

Definition 1 Let $\Psi : [0, \infty) \to [0, \infty)$ be an increasing and convex function with
$\Psi(0) = 0$. The $\Psi$-Orlicz norm of $Z$ is

$$\|Z\|_{\Psi} := \inf\{ c > 0 : \mathbb{E}\Psi(|Z|/c) \le 1 \}.$$

A special case is the $L_m(P)$-norm ($m \ge 1$), which corresponds to $\Psi(z) = z^m$. Other
important special cases are $\Psi(z) = \exp[z^2] - 1$ for sub-Gaussian random variables
and $\Psi(z) = \exp(z) - 1$ for sub-exponential random variables. We propose functions
$\Psi$ that combine sub-Gaussian intermediate tails and sub-exponential far tails.



For each $L > 0$ we define

$$\Psi_L(z) := \exp\left[ \left( \frac{\sqrt{1 + 2Lz} - 1}{L} \right)^2 \right] - 1, \quad z \ge 0. \tag{1}$$

It is easy to see that $\Psi_L$ is increasing and convex, and that $\Psi_L(0) = 0$.



Definition 2 Let $L > 0$ be given. The ($L$-)Bernstein-Orlicz norm is the $\Psi$-Orlicz
norm with $\Psi = \Psi_L$ given in (1).



Indeed, the Bernstein-Orlicz norm combines sub-Gaussian and sub-exponential behavior:

$$\Psi_L(z) \approx \begin{cases} \exp[z^2] - 1 & \text{for } Lz \text{ small}, \\ \exp[2z/L] - 1 & \text{for } Lz \text{ large}. \end{cases}$$

Note that the constant $L$ governs the range of the sub-Gaussian behavior. It is a
dimensionless constant, i.e., it does not depend on the scale of measurement.



The inverse of $\Psi_L$ is

$$\Psi_L^{-1}(t) = \sqrt{\log(1 + t)} + \frac{L}{2} \log(1 + t), \quad t \ge 0.$$

With this and with Chebyshev's inequality, one now directly derives a probability
inequality for $Z$.
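The function $\Psi_L$ and its inverse are easy to evaluate numerically. The following is a minimal sketch, not part of the paper: the helper names `exponent`, `psi` and `psi_inv` are hypothetical, and the exponent is computed separately so that the two regimes (small and large $Lz$) can be inspected without floating-point overflow.

```python
import math

def exponent(z: float, L: float) -> float:
    """Return ((sqrt(1 + 2 L z) - 1) / L)^2, i.e. log(1 + Psi_L(z)), for z >= 0, L > 0."""
    u = (math.sqrt(1.0 + 2.0 * L * z) - 1.0) / L
    return u * u

def psi(z: float, L: float) -> float:
    """Bernstein-Orlicz function Psi_L(z) = exp(exponent) - 1."""
    return math.expm1(exponent(z, L))

def psi_inv(t: float, L: float) -> float:
    """Inverse of Psi_L: sqrt(log(1 + t)) + (L / 2) * log(1 + t), for t >= 0."""
    a = math.log1p(t)
    return math.sqrt(a) + 0.5 * L * a

# Round trip Psi_L(Psi_L^{-1}(t)) = t on a few (L, t) pairs.
round_trip_ok = all(
    math.isclose(psi(psi_inv(t, L), L), t, rel_tol=1e-9)
    for L in (0.1, 1.0, 10.0)
    for t in (0.5, 1.0, 20.0)
)
```

For small $Lz$ the exponent is close to $z^2$ and for large $Lz$ it is close to $2z/L$, which is the interpolation described above.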



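Lemma 1 Let $\tau := \|Z\|_{\Psi_L}$. We have for all $t > 0$,

$$P\left( |Z| > \tau \left[ \sqrt{t} + \frac{Lt}{2} \right] \right) \le 2\exp[-t].$$

Proof of Lemma 1 By Chebyshev's inequality, for all $c > \tau$,

$$P\left( |Z|/c > \sqrt{t} + \frac{Lt}{2} \right) = P\left( |Z|/c > \Psi_L^{-1}(e^t - 1) \right) = P\left( \Psi_L(|Z|/c) > e^t - 1 \right) \le \left( \mathbb{E}\Psi_L(|Z|/c) + 1 \right) e^{-t}.$$

Thus,

$$P\left( |Z|/\tau > \sqrt{t} + \frac{Lt}{2} \right) = \lim_{c \downarrow \tau} P\left( |Z|/c > \sqrt{t} + \frac{Lt}{2} \right) \le \lim_{c \downarrow \tau} \left( \mathbb{E}\Psi_L(|Z|/c) + 1 \right) e^{-t} \le 2e^{-t}.$$

□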
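As a small worked example (an illustration added here, not taken from the paper), consider a variable with $|Z| = 1$ almost surely, such as a Rademacher variable. Then $\mathbb{E}\Psi_L(|Z|/c) = \Psi_L(1/c)$, so the norm is $\tau = 1/\Psi_L^{-1}(1)$, and the tail bound of Lemma 1 can be checked exactly. The function names below are hypothetical.

```python
import math

def psi_inv(t: float, L: float) -> float:
    """Inverse of Psi_L: sqrt(log(1 + t)) + (L / 2) * log(1 + t)."""
    a = math.log1p(t)
    return math.sqrt(a) + 0.5 * L * a

def norm_unit_modulus(L: float) -> float:
    """Bernstein-Orlicz norm of Z with |Z| = 1 a.s.: Psi_L(1/c) <= 1 iff c >= 1 / Psi_L^{-1}(1)."""
    return 1.0 / psi_inv(1.0, L)

def tail_threshold(tau: float, L: float, t: float) -> float:
    """Threshold tau * [sqrt(t) + L t / 2] appearing in Lemma 1."""
    return tau * (math.sqrt(t) + 0.5 * L * t)

L = 2.0
tau = norm_unit_modulus(L)
# P(|Z| > x) is 1 if x < 1 and 0 otherwise; compare with the bound 2 exp(-t).
lemma1_holds = all(
    (1.0 if tail_threshold(tau, L, t) < 1.0 else 0.0) <= 2.0 * math.exp(-t)
    for t in (0.1, 0.5, 0.69, 0.7, 1.0, 3.0)
)
```

In this example the bound is attained in the limit $t \uparrow \log 2$, where the threshold reaches 1 and $2\exp[-t]$ reaches 1.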



The next lemma says that a converse result holds as well, that is, from the probability
inequality of Lemma 1 one can derive a bound for the Bernstein-Orlicz norm,
with constants $L$ and $\tau$ multiplied by $\sqrt{3}$.



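Lemma 2 Suppose that for some constants $\tau$ and $L$, and for all $t > 0$,

$$P\left( |Z| \ge \tau \left[ \sqrt{t} + \frac{Lt}{2} \right] \right) \le 2\exp[-t].$$

Then $\|Z\|_{\Psi_{\sqrt{3}L}} \le \sqrt{3}\tau$.

Proof of Lemma 2 We have

$$\mathbb{E}\Psi_{\sqrt{3}L}\left( |Z|/(\sqrt{3}\tau) \right) = \int_0^\infty P\left( |Z| \ge \sqrt{3}\tau\, \Psi_{\sqrt{3}L}^{-1}(t) \right) dt$$

$$= \int_0^\infty P\left( |Z| \ge \sqrt{3}\tau \left[ \sqrt{\log(1 + t)} + \frac{\sqrt{3}L}{2} \log(1 + t) \right] \right) dt$$

$$= \int_0^\infty P\left( |Z| \ge \tau \left[ \sqrt{\log(1 + t)^3} + \frac{L}{2} \log(1 + t)^3 \right] \right) dt \le 2 \int_0^\infty \frac{1}{(1 + t)^3}\, dt = 1.$$

□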



We recall Bernstein's inequality, see Bennett [1962].

(Footnote to Lemma 2: the constant $\sqrt{3}$ can possibly be improved.)

Theorem 1 Let $X_1, \ldots, X_n$ be independent random variables with values in $\mathbb{R}$ and
with mean zero. Suppose that for some constants $\sigma$ and $K$, one has

$$\frac{1}{n} \sum_{i=1}^n \mathbb{E}|X_i|^m \le \frac{m!}{2} K^{m-2} \sigma^2, \quad m = 2, 3, \ldots.$$

Then for all $t > 0$,

$$P\left( \frac{1}{\sqrt{n}} \left| \sum_{i=1}^n X_i \right| \ge \sigma\sqrt{2t} + \frac{Kt}{\sqrt{n}} \right) \le 2\exp[-t].$$

The following corollary shows that $\|\cdot\|_{\Psi_L}$ indeed captures the nature of Bernstein's
inequality.

Corollary 1 Let $X_1, \ldots, X_n$ be independent random variables satisfying the conditions
of Theorem 1. Then by this theorem and Lemma 2, for $L := \sqrt{6}K/(\sqrt{n}\sigma)$,
we have

$$\left\| \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \right\|_{\Psi_L} \le \sqrt{6}\sigma.$$

3 The Bernstein-Orlicz norm for the maximum of finitely many variables

Using Orlicz norms, the argument for obtaining a bound for the expectation of
maxima is standard. We refer to van der Vaart and Wellner [1996] for a general
approach. We consider the special case of the Bernstein-Orlicz norm.



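Lemma 3 Let $\tau$ and $L$ be constants, and let $Z_1, \ldots, Z_p$ be random variables satisfying

$$\max_{1 \le j \le p} \|Z_j\|_{\Psi_L} \le \tau.$$

Then

$$\mathbb{E} \max_{1 \le j \le p} |Z_j| \le \tau \left[ \sqrt{\log(1 + p)} + \frac{L}{2} \log(1 + p) \right].$$

Proof of Lemma 3. Let $c > \tau$. Then by Jensen's inequality

$$\mathbb{E} \max_{1 \le j \le p} |Z_j| \le c\, \Psi_L^{-1}\left( \mathbb{E}\Psi_L\left( \max_{1 \le j \le p} |Z_j|/c \right) \right) = c\, \Psi_L^{-1}\left( \mathbb{E} \max_{1 \le j \le p} \Psi_L(|Z_j|/c) \right).$$

Therefore,

$$\mathbb{E} \max_{1 \le j \le p} |Z_j| \le \lim_{c \downarrow \tau} c\, \Psi_L^{-1}\left( p \max_{1 \le j \le p} \mathbb{E}\Psi_L(|Z_j|/c) \right) \le \tau\, \Psi_L^{-1}(p) = \tau \left[ \sqrt{\log(1 + p)} + \frac{L}{2} \log(1 + p) \right].$$

□

As a special case, one may consider the random variables

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n g_j(X_i), \quad j = 1, \ldots, p,$$

where $X_1, \ldots, X_n$ are independent random variables with values in some space $\mathcal{X}$,
and where $g_1, \ldots, g_p$ are real-valued functions on $\mathcal{X}$. If the $g_j(X_i)$ are centered for
all $i$ and $j$, and if one assumes the Bernstein condition

$$\frac{1}{n} \sum_{i=1}^n \mathbb{E}|g_j(X_i)|^m \le \frac{m!}{2} K^{m-2} \sigma^2, \quad m = 2, 3, \ldots, \quad j = 1, \ldots, p,$$

then one can apply Lemma 3 with $\tau := \sqrt{6}\sigma$ and $L := \sqrt{6}K/(\sqrt{n}\sigma)$, giving the
inequality

$$\mathbb{E} \max_{1 \le j \le p} \left| \frac{1}{\sqrt{n}} \sum_{i=1}^n g_j(X_i) \right| \le \sigma\sqrt{6\log(1 + p)} + \frac{3K}{\sqrt{n}} \log(1 + p). \tag{2}$$

This follows from Corollary 1. The constants can however be improved when using
direct arguments (see e.g. Lemma 14.12 in Bühlmann and van de Geer [2011]).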
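The passage from the Bernstein constants $(\sigma, K)$ to the Bernstein-Orlicz constants $(\tau, L)$ and then to the maximal inequality can be sketched as follows. This is an illustrative sketch under the reading $\tau = \sqrt{6}\sigma$, $L = \sqrt{6}K/(\sqrt{n}\sigma)$ of Corollary 1; the function names are hypothetical.

```python
import math

def psi_inv(t: float, L: float) -> float:
    """Inverse of Psi_L: sqrt(log(1 + t)) + (L / 2) * log(1 + t)."""
    a = math.log1p(t)
    return math.sqrt(a) + 0.5 * L * a

def bernstein_to_orlicz(sigma: float, K: float, n: int) -> tuple:
    """Return (tau, L) = (sqrt(6) sigma, sqrt(6) K / (sqrt(n) sigma))."""
    return math.sqrt(6.0) * sigma, math.sqrt(6.0) * K / (math.sqrt(n) * sigma)

def expected_max_bound(sigma: float, K: float, n: int, p: int) -> float:
    """Bound of Lemma 3, tau * Psi_L^{-1}(p), with (tau, L) from the Bernstein constants."""
    tau, L = bernstein_to_orlicz(sigma, K, n)
    return tau * psi_inv(p, L)

def expected_max_bound_closed_form(sigma: float, K: float, n: int, p: int) -> float:
    """Right-hand side of (2): sigma sqrt(6 log(1+p)) + 3 K log(1+p) / sqrt(n)."""
    a = math.log1p(p)
    return sigma * math.sqrt(6.0 * a) + 3.0 * K * a / math.sqrt(n)
```

Both functions return the same number, which makes explicit that the factor $3K/\sqrt{n}$ in (2) is $\tau L/2$.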

We now present a deviation inequality in probability for the maximum of finitely 
many variables. 

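Lemma 4 Let $Z_1, \ldots, Z_p$ be random variables satisfying for some $L$ and $\tau$

$$\max_{1 \le j \le p} \|Z_j\|_{\Psi_L} \le \tau.$$

Then for all $t > 0$

$$P\left( \max_{1 \le j \le p} |Z_j| \ge \tau \left[ \sqrt{\log(1 + p)} + \frac{L}{2} \log(1 + p) + \sqrt{t} + \frac{Lt}{2} \right] \right) \le 2\exp[-t].$$

Proof of Lemma 4 We first use that for any $a > 0$ and $t > 0$, one has $\sqrt{a} + \sqrt{t} \ge \sqrt{a + t}$, so that

$$P\left( \max_{1 \le j \le p} |Z_j| \ge \tau \left[ \sqrt{\log(1 + p)} + \frac{L}{2} \log(1 + p) + \sqrt{t} + \frac{Lt}{2} \right] \right) \le P\left( \max_{1 \le j \le p} |Z_j| \ge \tau \left[ \sqrt{t + \log(1 + p)} + \frac{L}{2} (t + \log(1 + p)) \right] \right).$$

Next, we apply the union bound and Lemma 1:

$$P\left( \max_{1 \le j \le p} |Z_j| \ge \tau \left[ \sqrt{t + \log(1 + p)} + \frac{L}{2} (t + \log(1 + p)) \right] \right) \le \sum_{j=1}^p P\left( |Z_j| \ge \tau \left[ \sqrt{t + \log(1 + p)} + \frac{L}{2} (t + \log(1 + p)) \right] \right)$$

$$\le 2p \exp[-(t + \log(1 + p))] = \frac{2p}{1 + p} \exp[-t] \le 2\exp[-t].$$

□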
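The two elementary steps in this proof (merging the square roots and the union-bound arithmetic) can be checked numerically. The sketch below is illustrative only; the function names are hypothetical.

```python
import math

def deviation_threshold(tau: float, L: float, p: int, t: float) -> float:
    """Threshold of Lemma 4: tau [sqrt(log(1+p)) + (L/2) log(1+p) + sqrt(t) + L t / 2]."""
    a = math.log1p(p)
    return tau * (math.sqrt(a) + 0.5 * L * a + math.sqrt(t) + 0.5 * L * t)

def merged_threshold(tau: float, L: float, p: int, t: float) -> float:
    """Smaller threshold used in the proof: tau [sqrt(u) + (L/2) u] with u = t + log(1+p)."""
    u = t + math.log1p(p)
    return tau * (math.sqrt(u) + 0.5 * L * u)

def union_bound(p: int, t: float) -> float:
    """p times the Lemma 1 tail at level t + log(1+p), i.e. 2 p exp(-(t + log(1+p)))."""
    return 2.0 * p * math.exp(-(t + math.log1p(p)))

grid = [(p, t) for p in (1, 5, 100, 10**6) for t in (0.01, 1.0, 25.0)]
thresholds_ordered = all(merged_threshold(1.5, 0.7, p, t) <= deviation_threshold(1.5, 0.7, p, t) for p, t in grid)
tail_controlled = all(union_bound(p, t) <= 2.0 * math.exp(-t) for p, t in grid)
```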



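Using Lemma 2 this is easily converted into the following deviation inequality for
the Bernstein-Orlicz norm. We use the notation

$$x_+ := x\, 1\{x > 0\}.$$

Lemma 5 Let $Z_1, \ldots, Z_p$ be random variables satisfying for some $L$ and $\tau$

$$\max_{1 \le j \le p} \|Z_j\|_{\Psi_L} \le \tau.$$

Then

$$\left\| \left( \max_{1 \le j \le p} |Z_j| - \tau \left[ \sqrt{\log(1 + p)} + \frac{L}{2} \log(1 + p) \right] \right)_+ \right\|_{\Psi_{\sqrt{3}L}} \le \sqrt{3}\tau.$$

Proof of Lemma 5 Let

$$Z := \left( \max_{1 \le j \le p} |Z_j| - \tau \left[ \sqrt{\log(1 + p)} + \frac{L}{2} \log(1 + p) \right] \right)_+.$$

By Lemma 4 we have for all $t > 0$

$$P\left( Z \ge \tau \left[ \sqrt{t} + \frac{Lt}{2} \right] \right) = P\left( \max_{1 \le j \le p} |Z_j| \ge \tau \left[ \sqrt{\log(1 + p)} + \frac{L}{2} \log(1 + p) + \sqrt{t} + \frac{Lt}{2} \right] \right) \le 2\exp[-t].$$

Application of Lemma 2 finishes the proof. □

4 Chaining along a tree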



A common technique for bounding suprema of stochastic processes is chaining as
developed by Kolmogorov, leading to versions of Dudley's entropy bound (Dudley
[1967]). See e.g. van der Vaart and Wellner [1996] or van de Geer [2000] and the
references therein. We however propose another method which we call chaining along
a tree. This method is conceptually simpler than the usual chaining and, as far as
we know, does not introduce unnecessary restrictions. An example will be detailed
in Section 5 for the case of entropy with bracketing. The generic chaining technique
of Talagrand [2005] is a refinement which we shall also consider in Definition 6 and
Theorem 3.

Let $S \in \mathbb{N}_0$ be fixed.

Definition 3 A finite tree $\mathcal{T}$ is a collection $\{G_s\}_{s=0}^S$ of disjoint subsets of $\{1, \ldots, N\}$
such that $\cup_{s=0}^S G_s = \{1, \ldots, N\}$, together with a function

$$\mathrm{parent} : \{1, \ldots, N\} \to \{1, \ldots, N\},$$

such that $\mathrm{parent}(j) \in G_{s-1}$ for $j \in G_s$, $s \in \{1, \ldots, S\}$. We call an element of
$\{1, \ldots, N\}$ a node, and $G_s$ a generation, $s = 0, \ldots, S$. A branch of the tree with
end node $j_S \in G_S$ is the sequence $\{j_0, \ldots, j_S\}$ with $j_{s-1} = \mathrm{parent}(j_s)$, $s = 1, \ldots, S$.

Definition 4 Let a collection of real-valued random variables $W := \{W_j\}_{j=1}^N$ be
given. A finite labeled tree $(\mathcal{T}, W)$ is a finite tree $\mathcal{T}$ with on each node $j$ a label $W_j$.
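The tree of Definition 3 is a plain combinatorial object: generations plus a parent map. A minimal sketch of such a structure, with a hypothetical class name `FiniteTree` and a small hand-made example, could look as follows.

```python
from dataclasses import dataclass

@dataclass
class FiniteTree:
    generations: list  # generations[s] lists the nodes of G_s, s = 0, ..., S
    parent: dict       # parent[j] for every node j in G_s with s >= 1

    def validate(self) -> bool:
        """Check that generations are disjoint and that parents live one generation up."""
        seen = set()
        for s, gen in enumerate(self.generations):
            for j in gen:
                if j in seen:
                    return False
                seen.add(j)
                if s >= 1 and self.parent.get(j) not in self.generations[s - 1]:
                    return False
        return True

    def branch(self, end_node: int) -> list:
        """Return the branch [j_0, ..., j_S] with j_{s-1} = parent(j_s) ending at end_node."""
        depth = len(self.generations) - 1
        if end_node not in self.generations[depth]:
            raise ValueError("end_node must belong to the last generation G_S")
        path = [end_node]
        for _ in range(depth):
            path.append(self.parent[path[-1]])
        return path[::-1]

# Example with S = 2 and N = 6: one root, two children, three end nodes.
tree = FiniteTree(generations=[[1], [2, 3], [4, 5, 6]],
                  parent={2: 1, 3: 1, 4: 2, 5: 2, 6: 3})
```

A labeled tree in the sense of Definition 4 would additionally attach a random variable $W_j$ to each node; here only the index structure is modeled.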

Let $\Theta$ be some countable set and let $Z_\theta \in \mathbb{R}$ be a random variable defined for each
$\theta \in \Theta$. We consider the supremum of the process $\{|Z_\theta| : \theta \in \Theta\}$.

(Footnote to Definition 3: actually, $\mathcal{T}$ is rather a forest consisting of $|G_0|$ trees.)

Definition 5 Let $\delta > 0$ and $\tau > 0$ be constants and let $\mathcal{L} := \{L_s\}_{s=0}^S$ be a sequence
of positive numbers. A $(\delta, \tau, \mathcal{L})$ finite tree chain for $\{Z_\theta\}$ is a finite labeled tree
$(\mathcal{T}, W)$ such that for all $s = 0, \ldots, S$,

$$\|W_j\|_{\Psi_{L_s}} \le \tau 2^{-s}, \quad \forall j \in G_s,$$

and such that one can apply chaining of $\{Z_\theta\}$ along the tree $(\mathcal{T}, W)$, with approximation
error $\delta$. That is, for each $\theta \in \Theta$ there is an end node $j_S \in G_S$ such that the
branch $\{j_0, \ldots, j_S\}$ satisfies

$$|Z_\theta| \le \sum_{s=0}^S |W_{j_s}| + \delta.$$

In the above definition, the approximation error $\delta$ will generally depend on the
depth $S$ of the tree. We assume that at a fine enough level, the approximation error
is small. The usual chaining technique does not assume a tree structure, but indeed
often needs only a finite number of steps. A tree structure follows if the members
at the finest level are taken as end nodes. With a finite number of steps, the sum
given in (3) is finite. This avoids requiring convergence of an infinite sum.

We have presented the definition of a finite tree chain for the Bernstein-Orlicz norm
$\|\cdot\|_{\Psi_L}$. However, the concept is not particularly tied up with this norm, e.g., for
sub-Gaussian cases one may choose to replace the Bernstein-Orlicz norm by the
$L_2(P)$ norm (corresponding to the case where the constants in $\mathcal{L}$ all vanish).

Let us now turn to the results. 

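Theorem 2 Let $(\mathcal{T}, W)$ be a $(\delta, \tau, \mathcal{L})$ finite tree chain for $\{Z_\theta\}$. Define

$$\gamma := \tau \sum_{s=0}^S 2^{-s} \left[ \sqrt{\log(1 + |G_s|)} + \frac{L_s}{2} \log(1 + |G_s|) \right]. \tag{3}$$

It holds that

$$\mathbb{E}\left[ \sup_{\theta \in \Theta} |Z_\theta| \right] \le \gamma + \delta. \tag{4}$$

Remark 1 One may minimize the right hand side of (4) over all finite trees.

Proof of Theorem 2 We have

$$\mathbb{E} \sup_{\theta \in \Theta} |Z_\theta| \le \sum_{s=0}^S \mathbb{E} \max_{j \in G_s} |W_j| + \delta.$$

Application of Lemma 3 gives that for each $s \in \{0, \ldots, S\}$

$$\mathbb{E} \max_{j \in G_s} |W_j| \le \tau 2^{-s} \left[ \sqrt{\log(1 + |G_s|)} + \frac{L_s}{2} \log(1 + |G_s|) \right].$$

□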
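The bound of Theorem 2 only needs the generation sizes $|G_s|$, the constants $L_s$, the scale $\tau$ and the approximation error $\delta$. The following sketch evaluates it, assuming the constant in (3) is $\gamma = \tau\sum_{s} 2^{-s}[\sqrt{\log(1+|G_s|)} + (L_s/2)\log(1+|G_s|)]$; the function names are hypothetical.

```python
import math

def gamma_tree_chain(tau: float, L: list, sizes: list) -> float:
    """gamma = tau * sum_s 2^{-s} [sqrt(log(1+|G_s|)) + (L_s / 2) log(1+|G_s|)]."""
    total = 0.0
    for s, (L_s, size) in enumerate(zip(L, sizes)):
        a = math.log1p(size)
        total += 2.0 ** (-s) * (math.sqrt(a) + 0.5 * L_s * a)
    return tau * total

def expected_sup_bound(tau: float, L: list, sizes: list, delta: float) -> float:
    """Right-hand side gamma + delta of the expectation bound."""
    return gamma_tree_chain(tau, L, sizes) + delta
```

For instance, `expected_sup_bound(1.0, [2.0, 2.0], [1, 3], 0.5)` sums the two generation contributions and then adds the approximation error 0.5.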



With generic chaining, the condition on the Bernstein-Orlicz norm of the labels is
dropped in the definition of the tree. This Bernstein-Orlicz norm then turns up in
the constants (5) and (6) which appear in the generic chaining bound of Theorem 3.

Definition 6 Let $\delta > 0$ be a constant. A $\delta$ finite generic tree chain for $\{Z_\theta\}$ is a
finite labeled tree $(\mathcal{T}, W)$ such that one can apply generic chaining of $\{Z_\theta\}$ along
the tree $(\mathcal{T}, W)$ with approximation error $\delta$. That is, for each $\theta \in \Theta$ there is an end
node $j_S \in G_S$ such that the branch $\{j_0, \ldots, j_S\}$ satisfies

$$|Z_\theta| \le \sum_{s=0}^S |W_{j_s}| + \delta.$$

Let $(\mathcal{T}, W)$ be a finite labeled tree. For each end node $k \in G_S$, we let

$$\{j_0(k), \ldots, j_S(k)\}$$

be the corresponding branch (so that $j_S(k) = k$), and we write

$$W_s(k) := W_{j_s(k)}, \quad k \in G_S, \quad s = 0, 1, \ldots, S.$$

Fix a sequence of positive constants $\mathcal{L} := \{L_s\}_{s=0}^S$. We write for $k \in G_S$,

$$\gamma_{1,*}(k) := \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} \sqrt{\log(1 + |G_s|)}, \tag{5}$$

$$\gamma_{2,*}(k) := \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} L_s \log(1 + |G_s|), \tag{6}$$

and

$$\gamma_*(k) := \gamma_{1,*}(k) + \frac{\gamma_{2,*}(k)}{2}.$$

Moreover, we let

$$\gamma_{1,*} := \max_{k \in G_S} \gamma_{1,*}(k), \quad \gamma_{2,*} := \max_{k \in G_S} \gamma_{2,*}(k), \quad \gamma_* := \max_{k \in G_S} \gamma_*(k),$$

and

$$\tau_* := \max_{k \in G_S} \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} \sqrt{1 + s},$$

and

$$L_* \tau_* := \max_{k \in G_S} \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} (1 + s) L_s.$$



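Theorem 3 Let $(\mathcal{T}, W)$ be a $\delta$ finite generic tree chain for $\{Z_\theta\}$. Then for all $t > 0$,

$$P\left( \sup_{\theta \in \Theta} |Z_\theta| \ge \gamma_* + \delta + \tau_* \left[ 1 + \frac{L_*}{2} \right] + \tau_* \left[ \sqrt{t} + \frac{L_* t}{2} \right] \right) \le 2\exp[-t].$$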
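The constants entering Theorem 3 are maxima over branches of weighted sums of the label norms. The sketch below computes them from a table of norms along each branch. It assumes the readings $\gamma_*(k) = \gamma_{1,*}(k) + \gamma_{2,*}(k)/2$, $\tau_* = \max_k \sum_s \|W_s(k)\|\sqrt{1+s}$ and $L_*\tau_* = \max_k \sum_s \|W_s(k)\|(1+s)L_s$; the function names and the input layout are hypothetical.

```python
import math

def generic_chain_constants(norms: list, L: list, sizes: list) -> tuple:
    """norms[k][s] is the Bernstein-Orlicz norm of W_s(k) along the branch of end node k.
    Returns (gamma_star, tau_star, L_star)."""
    log_sizes = [math.log1p(m) for m in sizes]
    gamma_star = max(
        sum(w * (math.sqrt(a) + 0.5 * L_s * a) for w, L_s, a in zip(branch, L, log_sizes))
        for branch in norms
    )
    tau_star = max(sum(w * math.sqrt(1 + s) for s, w in enumerate(branch)) for branch in norms)
    tau_L_star = max(
        sum(w * (1 + s) * L_s for s, (w, L_s) in enumerate(zip(branch, L))) for branch in norms
    )
    return gamma_star, tau_star, tau_L_star / tau_star

def theorem3_threshold(gamma_star: float, tau_star: float, L_star: float, delta: float, t: float) -> float:
    """gamma_* + delta + tau_* [1 + L_*/2] + tau_* [sqrt(t) + L_* t / 2]."""
    shift = gamma_star + delta + tau_star * (1.0 + 0.5 * L_star)
    return shift + tau_star * (math.sqrt(t) + 0.5 * L_star * t)

# Two branches of depth S = 1 with hypothetical norms.
gamma_star, tau_star, L_star = generic_chain_constants(
    norms=[[1.0, 0.5], [1.0, 0.25]], L=[1.0, 2.0], sizes=[1, 2]
)
```

Note that the three maxima may be attained on different branches, so $L_*$ is defined through the product $L_*\tau_*$ rather than branch by branch.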



Remark 2 The result of Theorem [3] may again be optimized over all finite generic 
trees. 



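Proof of Theorem 3 Define for $s = 0, \ldots, S$,

$$\alpha_s := \sqrt{\log(1 + |G_s|)} + \frac{L_s}{2} \log(1 + |G_s|) + \sqrt{(1 + s)(1 + t)} + \frac{L_s (1 + s)(1 + t)}{2}.$$

Using Lemma 4 we see that

$$P\left( \max_{j \in G_s} \frac{|W_j|}{\|W_j\|_{\Psi_{L_s}}} \ge \alpha_s \right) \le 2\exp[-(1 + t)(1 + s)], \quad s = 0, \ldots, S. \tag{7}$$

We have

$$P\left( \max_{k \in G_S} \sum_{s=0}^S |W_s(k)| \ge \max_{k \in G_S} \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} \alpha_s \right) \le P\left( \exists k : \sum_{s=0}^S |W_s(k)| \ge \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} \alpha_s \right)$$

$$\le P\left( \exists k, s : |W_s(k)| \ge \|W_s(k)\|_{\Psi_{L_s}} \alpha_s \right) \le \sum_{s=0}^S P\left( \max_{k \in G_S} \frac{|W_s(k)|}{\|W_s(k)\|_{\Psi_{L_s}}} \ge \alpha_s \right).$$

Now insert (7) to find

$$P\left( \max_{k \in G_S} \sum_{s=0}^S |W_s(k)| \ge \max_{k \in G_S} \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} \alpha_s \right) \le 2 \sum_{s=0}^S \exp[-(1 + t)(1 + s)]$$

$$\le \frac{2e^{-(1+t)}}{1 - e^{-(1+t)}} \le \frac{2e^{-1}}{1 - e^{-1}} \exp[-t] \le 2\exp[-t].$$

We have by definition

$$\max_{k \in G_S} \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} \left[ \sqrt{\log(1 + |G_s|)} + \frac{L_s}{2} \log(1 + |G_s|) \right] = \gamma_*,$$

$$\max_{k \in G_S} \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} \sqrt{1 + s} = \tau_*,$$

and

$$\max_{k \in G_S} \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} (1 + s) L_s = \tau_* L_*.$$

Therefore,

$$\max_{k \in G_S} \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} \alpha_s \le \gamma_* + \tau_* \sqrt{1 + t} + \frac{\tau_* L_* (1 + t)}{2} \le \gamma_* + \tau_* \left[ 1 + \frac{L_*}{2} \right] + \tau_* \left[ \sqrt{t} + \frac{L_* t}{2} \right].$$

□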



Note that the constants $\tau_*$ and $L_*$ possibly depend on the complexity of $\Theta$ through
the quantities $\{\|W_s(k)\|_{\Psi_{L_s}} : k \in G_S,\ s = 0, \ldots, S\}$. Moreover, the choice of the
constants $\mathcal{L} = \{L_s\}_{s=0}^S$ may also depend on the complexity of $\Theta$. In the application
to the empirical process (see Section 5), the latter will indeed be the case. We will
nevertheless derive there a deviation inequality where we put the dependency on
the complexity of $\Theta$ in the shift.

As a simple corollary of Theorem 3, one obtains a deviation inequality in the
Bernstein-Orlicz norm. We state this for completeness. In Section 5 we will not
apply Corollary 2 directly, because as such, it does not allow us to put all
dependency on the complexity of $\Theta$ in the shift.

Corollary 2 Let the conditions of Theorem 3 be met. Then the combination of this
theorem with Lemma 2 gives

$$\left\| \left( \sup_{\theta \in \Theta} |Z_\theta| - \left( \gamma_* + \delta + \tau_* [1 + L_*/2] \right) \right)_+ \right\|_{\Psi_{\sqrt{3}L_*}} \le \sqrt{3}\tau_*.$$

By Jensen's inequality, we then get

$$\mathbb{E} \sup_{\theta \in \Theta} |Z_\theta| \le \gamma_* + \delta + \tau_* \left[ 1 + \frac{L_*}{2} + \sqrt{3\log 2} + \frac{3\log 2}{2} L_* \right].$$



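Example 1 In Talagrand [2005], the size $|G_s|$ of generation $s$ is fixed to be

$$|G_s| = 2^{2^s}, \quad s = 0, \ldots, S.$$

In that case,

$$\log(1 + |G_s|) \le (2^s + 1) \log 2 \le 2^{s+1}.$$

Hence

$$\gamma_* \le 2\gamma_0,$$

where

$$\gamma_0 := \max_{k \in G_S} \gamma_0(k),$$

and for $k \in G_S$,

$$\gamma_0(k) := \gamma_{1,0}(k) + \frac{\gamma_{2,0}(k)}{2}$$

and

$$\gamma_{1,0}(k) := \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} 2^{s/2}, \quad \gamma_{2,0}(k) := \sum_{s=0}^S \|W_s(k)\|_{\Psi_{L_s}} L_s 2^s.$$

Furthermore, since $1 + s \le 2^s$ for all $s \ge 0$,

$$\tau_* \le \gamma_{1,0} := \max_{k \in G_S} \gamma_{1,0}(k),$$

and

$$\tau_* L_* \le \gamma_{2,0} := \max_{k \in G_S} \gamma_{2,0}(k).$$

Hence,

$$\gamma_* + \tau_* \left[ 1 + \frac{L_*}{2} \right] \le 3 \left[ \gamma_{1,0} + \frac{\gamma_{2,0}}{2} \right]$$

and

$$\sqrt{3}\tau_* \left[ \sqrt{\log 2} + \frac{\sqrt{3}L_*}{2} \log 2 \right] \le \sqrt{3\log 2}\, \gamma_{1,0} + \frac{3\log 2}{2} \gamma_{2,0}.$$

It follows from Corollary 2 that

$$\mathbb{E} \sup_{\theta \in \Theta} |Z_\theta| \le \left( 3 + \sqrt{3\log 2} \right) \gamma_{1,0} + \frac{3(1 + \log 2)}{2} \gamma_{2,0} + \delta.$$

Thus, we arrive at a special case of Theorem 1.2.7 in Talagrand [2005]. The latter
book does not treat deviation inequalities.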
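The elementary inequalities used in Example 1 for the doubly exponential generation sizes $|G_s| = 2^{2^s}$ can be verified directly. The sketch below is illustrative: it checks $\log(1 + 2^{2^s}) \le (2^s + 1)\log 2 \le 2^{s+1}$ and $1 + s \le 2^s$ for small $s$, using exact integer arithmetic for the sizes so that no floating-point overflow occurs. The function name is hypothetical.

```python
import math

def log_one_plus_size(s: int) -> float:
    """log(1 + 2^(2^s)); Python integers are exact, and math.log accepts large integers."""
    return math.log(1 + 2 ** (2 ** s))

checks = []
for s in range(0, 11):
    first = log_one_plus_size(s) <= (2 ** s + 1) * math.log(2)
    second = (2 ** s + 1) * math.log(2) <= 2 ** (s + 1)
    third = 1 + s <= 2 ** s
    checks.append(first and second and third)
```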

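When using a $(\delta, \tau, \mathcal{L})$ finite tree chain, one takes $\|W_s(k)\|_{\Psi_{L_s}} \le \tau 2^{-s}$ for all $s$ and
$k \in G_S$. In that case, the constants $\tau_*$ and $L_*$ in the bounds given in Corollary 2
only depend on the scale parameter $\tau$ and on the constants $\mathcal{L} = \{L_s\}_{s=0}^S$. This is
detailed in the next theorem.

Theorem 4 Let the conditions of Theorem 2 be met, and define

$$\gamma := \tau \sum_{s=0}^S 2^{-s} \left[ \sqrt{\log(1 + |G_s|)} + \frac{L_s}{2} \log(1 + |G_s|) \right], \quad L := \frac{1}{4} \sum_{s=0}^S 2^{-s} (1 + s) L_s.$$

Then for all $t > 0$

$$P\left( \sup_{\theta \in \Theta} |Z_\theta| \ge \gamma + \delta + 4\tau \left[ 1 + \frac{L}{2} \right] + 4\tau \left[ \sqrt{t} + \frac{Lt}{2} \right] \right) \le 2\exp[-t].$$

Proof of Theorem 4 This follows from Theorem 3, where one takes $\|W_s(k)\|_{\Psi_{L_s}} \le \tau 2^{-s}$.
We have

$$\tau_*/\tau \le \sum_{s=0}^S 2^{-s} \sqrt{1 + s} = 2 \sum_{s=1}^{S+1} 2^{-s} \sqrt{s} \le 2 \int_0^\infty 2^{-x} \sqrt{x}\, dx = \frac{\sqrt{\pi}}{(\log 2)^{3/2}} \le 4.$$

Moreover,

$$\tau_* L_* \le \tau \sum_{s=0}^S 2^{-s} (1 + s) L_s = 4\tau L.$$

□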
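The numerical constant 4 in Theorem 4 comes from bounding the series $\sum_{s \ge 0} 2^{-s}\sqrt{1+s}$. A quick check, given here as an illustrative sketch with hypothetical names, compares partial sums with the closed form $2\int_0^\infty 2^{-x}\sqrt{x}\,dx = \sqrt{\pi}/(\log 2)^{3/2}$ (which follows from $\Gamma(3/2) = \sqrt{\pi}/2$).

```python
import math

def partial_sum(S: int) -> float:
    """sum_{s=0}^{S} 2^{-s} sqrt(1 + s)."""
    return sum(2.0 ** (-s) * math.sqrt(1 + s) for s in range(S + 1))

# 2 * int_0^inf 2^{-x} sqrt(x) dx = 2 * Gamma(3/2) / (log 2)^{3/2} = sqrt(pi) / (log 2)^{3/2}
integral_bound = math.sqrt(math.pi) / math.log(2.0) ** 1.5
```

The partial sums stay below the integral value, which in turn is below 4, so the factor 4 leaves some slack.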



5 Application to empirical processes 

Let $\mathcal{X}$ be some measurable space, and consider independent $\mathcal{X}$-valued random variables
$X_1, \ldots, X_n$. Let $\mathcal{G}$ be a collection of real-valued functions on $\mathcal{X}$.

Write

$$P_n g := \frac{1}{n} \sum_{i=1}^n g(X_i), \quad P g := \frac{1}{n} \sum_{i=1}^n \mathbb{E} g(X_i),$$

and

$$\|g\|^2 := P g^2.$$

We assume the normalization

$$\sup_{g \in \mathcal{G}} \|g\| \le 1.$$

We study the supremum of the empirical process $\{\nu_n(g) : g \in \mathcal{G}\}$, where
$\nu_n(g) := \sqrt{n}(P_n - P)g$.

We recall the deviation inequality of Massart [2000], which refines the constants in
Talagrand [1996].

Theorem 5 (Massart [2000]) Suppose that for a constant $K$

$$\sup_{g \in \mathcal{G}} \sup_{x \in \mathcal{X}} |g(x)| \le K. \tag{8}$$

Then for all $\epsilon > 0$ and all $t > 0$, it holds that

$$P\left( \sup_{g \in \mathcal{G}} |\nu_n(g)| \ge (1 + \epsilon)\, \mathbb{E} \sup_{g \in \mathcal{G}} |\nu_n(g)| + \sqrt{2\kappa t} + \kappa(\epsilon) K t/\sqrt{n} \right) \le \exp[-t], \tag{9}$$

where $\kappa$ and $\kappa(\epsilon)$ can be taken equal to $\kappa = 4$ and $\kappa(\epsilon) = 2.5 + 32/\epsilon$.



For the i.i.d. case, Bousquet [2002] obtained constants remarkably close to those for
the case where $\mathcal{G}$ is a singleton. In fact, Massart [2000] and Bousquet [2002] and
others have derived concentration inequalities which in addition to upper bounds
show similar lower bounds for the supremum of the empirical process. This is
complemented in Lederer and van de Geer to moment concentration inequalities
assuming only moment conditions on the envelope $\Gamma(\cdot) := \sup_{g \in \mathcal{G}} |g(\cdot)|$, instead of
the boundedness assumption (8).

In this paper, we provide a deviation inequality of the same spirit as in the above
Theorem 5, where we replace condition (8) by a weaker Bernstein condition (see
(11)), which essentially requires that the $g(X_i)$ have sub-exponential tails, and
where we also present a deviation result in Bernstein-Orlicz norm. These deviation
results in probability and in Bernstein-Orlicz norm are given in Theorem 8.
We have not tried to optimize the constants. Moreover, we replace the expectation
$\mathbb{E} \sup_{g \in \mathcal{G}} |\nu_n(g)|$ in (9) by the upper bound we obtain from chaining arguments.
Deviation inequalities for the sub-exponential case can be found in literature (see
e.g. Viens and Vizcarra [2007]), but these do not cover the more refined interpolation
of sub-Gaussian and sub-exponential tail behavior. The above cited work also
contains lower bounds for suprema, thus completing the results to concentration
inequalities.

Now our first aim is to show that entropy with bracketing conditions allow one to
construct a finite tree chain. We recall here the definition of a bracketing set and
entropy with bracketing (see Blum [1955], or see van der Vaart and Wellner [1996],
van de Geer [2000] and their references).



Definition 7 Let $s > 0$ be arbitrary. A $2^{-s}$-bracketing set for $\{\mathcal{G}, \|\cdot\|\}$ is a finite
collection of functions $\{[g_j^L, g_j^U]\}_{j=1}^{N_s}$ satisfying $\|g_j^U - g_j^L\| \le 2^{-s}$ for all $j$, and such
that for each $g \in \mathcal{G}$ there is a $j \in \{1, \ldots, N_s\}$ such that $g_j^L \le g \le g_j^U$. If no such
finite collection exists, we write $N_s = \infty$.



We also introduce a generalized bracketing set, in the spirit of van de Geer [2000].



Definition 8 Let $K > 0$ be a fixed constant. A generalized bracketing set for $\mathcal{G}$ is
a finite collection of functions $\{[g_j^L, g_j^U]\}_{j=1}^{N_0}$ satisfying for all $j$

$$P|g_j^U - g_j^L|^m \le \frac{m!}{2} (2K)^{m-2}, \quad m = 2, 3, \ldots,$$

and such that for each $g \in \mathcal{G}$ there is a $j \in \{1, \ldots, N_0\}$ such that $g_j^L \le g \le g_j^U$.
Write $N_0 = \infty$ if no such finite collection exists.



(Footnote on the chaining upper bound that replaces the expectation in (9): this upper
bound can be shown to be (up to constants) tight in certain examples. The upper
bound following from generic chaining is modulo constants tight for the general sub-Gaussian case.)

A special case is where the envelope function $\Gamma := \sup_{g \in \mathcal{G}} |g|$ satisfies the Bernstein
condition

$$P\Gamma^m \le \frac{m!}{2} (2K)^{m-2}, \quad m = 2, 3, \ldots.$$

Then one can take $[-\Gamma, \Gamma]$ as generalized bracketing set, consisting of only one
element.

In what follows, we let for each $s \in \mathbb{N}$, $N_s$ be the cardinality of a minimal
$2^{-s}$-bracketing set for $\mathcal{G}$. The $2^{-s}$-entropy with bracketing of $\mathcal{G}$ is

$$H_s := \log(1 + N_s), \quad s \in \mathbb{N}.$$

Moreover, $N_0$ is the cardinality of a minimal generalized bracketing set, and we let

$$H_0 := \log(1 + N_0).$$

Finally, we write

$$\bar{N}_s := \prod_{k=0}^s N_k, \quad \bar{H}_s := \log(1 + \bar{N}_s), \quad s \in \mathbb{N}_0. \tag{10}$$

The following theorem uses arguments of Ossiander [1987], and is comparable to
Theorem 2.7.11 in Talagrand [2005] (who adapts the technique of Ossiander [1987]).
However, we do not use generic chaining here. On the other hand, our results lead
to the more involved deviation inequalities as given in Theorem 8.

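Theorem 6 Suppose that for some constant $K \ge 1$, one has the Bernstein condition

$$\sup_{g \in \mathcal{G}} P|g|^m \le \frac{m!}{2} K^{m-2}, \quad m = 2, 3, \ldots. \tag{11}$$

Let $S$ be some integer, $\tau := 3\sqrt{6}$ and $\delta := 4\sqrt{n} \sum_{s=1}^S 2^{-2s}/K_{s-1} + \sqrt{n}\, 2^{-S}$, where
$\{K_{s-1}\}_{s=1}^S$ is an arbitrary decreasing sequence of positive constants (called truncation
levels). Suppose that $N_s < \infty$ for all $s = 0, \ldots, S$. Then there is a $(\delta, \tau, \mathcal{L})$ finite
tree chain for $\{\nu_n(g)\}$, with $|G_s| \le \bar{N}_s$, $s = 0, \ldots, S$, and with

$$L_0 = \frac{4\sqrt{6}K}{\sqrt{n}}, \quad L_s = \frac{2\sqrt{6}\, 2^s K_{s-1}}{\sqrt{n}}, \quad s = 1, \ldots, S.$$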

As a consequence, we can derive a bound for the expectation of the supremum of 
the empirical process. 

Theorem 7 Assume the Bernstein condition (11). Let

$$E_S := 2^{-S} \sqrt{n} + 14 \sum_{s=0}^S 2^{-s} \sqrt{6H_s} + \frac{6^2 K H_0}{\sqrt{n}}.$$

Then one has

$$\mathbb{E} \sup_{g \in \mathcal{G}} |\nu_n(g)| \le \min_S E_S.$$
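To see how the depth $S$ trades the approximation term $2^{-S}\sqrt{n}$ against the entropy sum, one can evaluate the bound for a given entropy sequence. The sketch below assumes the reading $E_S = 2^{-S}\sqrt{n} + 14\sum_{s=0}^S 2^{-s}\sqrt{6H_s} + 6^2 K H_0/\sqrt{n}$ of the display in Theorem 7; the function names and the sample entropies are hypothetical.

```python
import math

def E(S: int, n: int, K: float, H: list) -> float:
    """E_S = 2^{-S} sqrt(n) + 14 sum_{s=0}^{S} 2^{-s} sqrt(6 H_s) + 36 K H_0 / sqrt(n)."""
    chain = sum(2.0 ** (-s) * math.sqrt(6.0 * H[s]) for s in range(S + 1))
    return 2.0 ** (-S) * math.sqrt(n) + 14.0 * chain + 36.0 * K * H[0] / math.sqrt(n)

def best_depth(n: int, K: float, H: list) -> int:
    """Depth S in {0, ..., len(H) - 1} minimizing E_S."""
    return min(range(len(H)), key=lambda S: E(S, n, K, H))

# Hypothetical entropies H_s = log(1 + N_s) with N_s = 2^s, for illustration only.
H_example = [math.log(1 + 2 ** s) for s in range(12)]
S_opt = best_depth(10**6, 1.0, H_example)
```

For large $n$ the minimizing depth moves away from $S = 0$, because the term $2^{-S}\sqrt{n}$ dominates at shallow depths.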

Remark 3 When $\mathcal{G}$ is finite, say $|\mathcal{G}| = p$, one may choose a bound with $S = \delta = 0$,
and $H_0 \le \log(1 + p)$. Theorem 7 then yields, up to constants, the same bound as
in (2).

Finally, we present the main result of this section. We give deviation results in
probability and in Bernstein-Orlicz norm, where the dependency on the complexity
of $\mathcal{G}$ is only in the shift.



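Theorem 8 Assume the Bernstein condition (11). Define as in Theorem 7

$$E_S := 2^{-S} \sqrt{n} + 14 \sum_{s=0}^S 2^{-s} \sqrt{6H_s} + \frac{6^2 K H_0}{\sqrt{n}}.$$

Let

$$L := \frac{\sqrt{6}K}{2\sqrt{n}}.$$

Then for all $t > 0$,

$$P\left( \sup_{g \in \mathcal{G}} |\nu_n(g)| \ge \min_S E_S + 6^2 K/\sqrt{n} + 24\sqrt{6} + 24\sqrt{6} \left[ \sqrt{t} + \frac{Lt}{2} \right] \right) \le 2\exp[-t].$$

Moreover,

$$\left\| \left( \sup_{g \in \mathcal{G}} |\nu_n(g)| - \left( \min_S E_S + 6^2 K/\sqrt{n} + 24\sqrt{6} \right) \right)_+ \right\|_{\Psi_{\sqrt{3}L}} \le 72\sqrt{2}.$$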

Theorem 8 can be compared to results in Adamczak [2008]. One sees that our bound
replaces the sub-exponential Orlicz norm

$$\left\| \max_{1 \le i \le n} \sup_{g \in \mathcal{G}} |g(X_i)| \right\|_{\Psi}, \quad \Psi(z) = \exp(z) - 1,\ z \ge 0,$$

occurring in Adamczak [2008] by a constant proportional to $K$, which means we
generally gain a $\log n$-term. On the other hand, the shift in Adamczak [2008] is up
to a factor $(1 + \epsilon)$ equal to the expectation

$$\mathbb{E} \sup_{g \in \mathcal{G}} |\nu_n(g)|,$$

as in Massart [2000] (whose result is cited here in Theorem 5).



Remark 4 Again, when $|\mathcal{G}| = p$ is finite, one can choose $S = \delta = 0$, and $H_0 \le \log(1 + p)$,
as in Remark 3. Theorem 8 then reduces to the usual union bound type
deviation inequalities for the maximum of finitely many random variables (that is,
the results are, up to constants, a special case of Lemmas 4 and 5).



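6 Proofs for Section 5

6.1 Proof of Theorem 6

This follows from similar arguments as in van de Geer [2000], who uses in turn ideas
of Ossiander [1987]. Let for $s = 1, \ldots, S$,

$$\{[\tilde{g}_j^{s,L}, \tilde{g}_j^{s,U}]\}_{j=1}^{N_s}$$

be a minimal $2^{-s}$-bracketing set for $\{\mathcal{G}, \|\cdot\|\}$. Let $\{[\tilde{g}_j^{0,L}, \tilde{g}_j^{0,U}]\}_{j=1}^{N_0}$ be a generalized
bracketing set.

Consider some $g \in \mathcal{G}$, and let $[\tilde{g}^{0,L}, \tilde{g}^{0,U}]$ be the corresponding generalized bracket,
and for all $s \in \{1, \ldots, S\}$, let the corresponding brackets be $[\tilde{g}^{s,L}, \tilde{g}^{s,U}]$. Thus

$$\tilde{g}^{s,L} \le g \le \tilde{g}^{s,U}, \quad s = 0, \ldots, S,$$

and

$$P|\tilde{g}^{0,U} - \tilde{g}^{0,L}|^m \le \frac{m!}{2} (2K)^{m-2}, \quad m = 2, 3, \ldots,$$

$$P|\tilde{g}^{s,U} - \tilde{g}^{s,L}|^2 \le 2^{-2s}, \quad s = 1, \ldots, S.$$

If for some $s$ there are several brackets in $\{[\tilde{g}_j^{s,L}, \tilde{g}_j^{s,U}]\}$ corresponding to $g$, we
choose a fixed but otherwise arbitrary one. Define

$$g^{s,L} := \max_{0 \le k \le s} \tilde{g}^{k,L}, \quad g^{s,U} := \min_{0 \le k \le s} \tilde{g}^{k,U}.$$

Then

$$g^{0,L} \le g^{1,L} \le \cdots \le g^{S,L} \le g \le g^{S,U} \le \cdots \le g^{1,U} \le g^{0,U}$$

and moreover $g^{s,U} - g^{s,L} \le \tilde{g}^{s,U} - \tilde{g}^{s,L}$. Denote the difference between upper and
lower bracket by

$$\Delta^s := g^{s,U} - g^{s,L}, \quad s = 0, \ldots, S.$$

The differences $\Delta^s$ are decreasing in $s$. Furthermore, $\|\Delta^s\| \le 2^{-s}$, for all $s \in \{0, 1, \ldots, S\}$.

Let $\mathcal{N}_s := |\{[g_j^{s,L}, g_j^{s,U}]\}|$, $s = 0, \ldots, S$. It is easy to see that

$$\mathcal{N}_s \le \prod_{k=0}^s N_k = \bar{N}_s, \quad s = 0, \ldots, S.$$

We define a tree with end nodes $\{1, \ldots, \mathcal{N}_S\}$. At each end node $j$ sits a pair of
brackets $[g_j^{S,L}, g_j^{S,U}]$. For each $s = 0, \ldots, S - 1$, we define the parents at generation
$s$ as follows. Let

$$V_k^s := \{ l : [g_k^{s-1,L}, g_k^{s-1,U}] \text{ forms a } 2^{-(s-1)}\text{-bracket for } [g_l^{s,L}, g_l^{s,U}] \}.$$

Then $\cup_{k=1}^{\mathcal{N}_{s-1}} V_k^s = \{1, \ldots, \mathcal{N}_s\}$, that is, for each bracket $[g_l^{s,L}, g_l^{s,U}]$ there is a $k \in
\{1, \ldots, \mathcal{N}_{s-1}\}$ with $l \in V_k^s$. To see this, we note that for each $l$, there is a function
$g$ with $g_l^{s,L} \le g \le g_l^{s,U}$, and by the above construction, there is a $k$ with $g_k^{s-1,L} \le
g_l^{s,L} \le g \le g_l^{s,U} \le g_k^{s-1,U}$. We let $\{\tilde{V}_k^s\}_{k=1}^{\mathcal{N}_{s-1}}$ be a disjoint version of $\{V_k^s\}$, e.g., the
one given by

$$\tilde{V}_1^s = V_1^s, \quad \tilde{V}_k^s = V_k^s \setminus \cup_{l=1}^{k-1} V_l^s, \quad k = 2, \ldots, \mathcal{N}_{s-1}.$$

We let

$$\mathrm{parent}(j_s) = k \quad \text{if } j_s \in \tilde{V}_k^s.$$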

We now turn to an adaptive truncation device. For each $s = 0, \ldots, S - 1$, we are
given truncation levels $K_s$, such that $K_s$ is assumed to be decreasing in $s$. Let $g$ be
fixed and

$$g^{0,L} \le g^{1,L} \le \cdots \le g^{S,L} \le g \le g^{S,U} \le \cdots \le g^{1,U} \le g^{0,U}.$$

Define

$$\Delta^s := g^{s,U} - g^{s,L}, \quad y_s := 1\{\Delta^s \ge K_s\}.$$

Then 

= 1} < An{y, = 1}, s = 0, . . . , S - 1, 
which implies (for $s = 0, \ldots, S - 1$)



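We can write any $g \in \mathcal{G}$ as

$$g = \sum_{s=1}^S (g - g^{s,L})\, 1\{y_s = 1, y_{s-1} = \cdots = y_0 = 0\} + \sum_{s=1}^S (g^{s,L} - g^{s-1,L})\, 1\{y_{s-1} = \cdots = y_0 = 0\} + g^{0,L} + (g - g^{0,L})\, 1\{y_0 = 1\}. \tag{12}$$

Let

$$W_{g,0} := |\nu_n(g^{0,L})| + |\nu_n(\Delta^0)|,$$

$$W_{g,s} := |\nu_n(\Delta^s 1\{y_{s-1} = 0\})| + |\nu_n((g^{s,L} - g^{s-1,L}) 1\{y_{s-1} = 0\})|, \quad s = 1, \ldots, S.$$

Then it follows from (12) that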

s s s 

Wni9)\ < E 1^^-= I + V^E ^^'Hy. = 1} < E I + '5, 



Let 



for 



Note now that 



<5^v^E 



4 2 



-2s 



Ks-1 



SO 



By Corollary n 
for 



||i^n(ff°'^)IU.„ <2V6, 
io = V6{8K/2)/y^. = aVg/V^, 



where we multiplied by a factor 2 because the Bernstein condition for the centered 
functions holds with the above AK replaced by 8K. Moreover, Lq = ^/6{AK) / ^/n, 
so again by Corollary [U 



The triangle inequality gives 

kn(5"''')l + l^«(/i°)| 



< 3\/6 =: T. 



fin 



Moreover, for s — I, . . . , S, 
9 

and 



Z\n{j/3_i = 0} < A'-^ < Ks-i, \\A'\\ < T 
So, again by Corollary 1, we may take

:= V6 2' max(^i^,_i/2, = s = 1, . . . , 5. 



Then, again by the triangle inequality, 

kn((5^'^ - 5^-i'^)l{2/.-i = 0})| + \v„{An{y,^^ = 0})| 



< 3^6 2^ 



□ 



6.2 Three technical lemmas 

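To apply the result of Theorem 6 we need three technical lemmas. First we need a
bound for $\bar{N}_s := \prod_{k=0}^s N_k$, or actually for $\bar{H}_s := \log(1 + \bar{N}_s)$.

Lemma 6 Let $s \in \{0, \ldots, S\}$, $\bar{H}_s := \log(1 + \prod_{k=0}^s N_k)$ and $H_s := \log(1 + N_s)$. It
holds that

$$\sum_{s=1}^S 2^{-s} \sqrt{\bar{H}_s} \le \sqrt{H_0} + 2 \sum_{s=1}^S 2^{-s} \sqrt{H_s}.$$

Proof of Lemma 6 We have

$$\bar{H}_s \le \sum_{k=0}^s H_k,$$

so

$$\sum_{s=1}^S 2^{-s} \sqrt{\bar{H}_s} \le \sum_{s=1}^S 2^{-s} \sqrt{H_0} + \sum_{s=1}^S 2^{-s} \sum_{k=1}^s \sqrt{H_k} = \sum_{s=1}^S 2^{-s} \sqrt{H_0} + \sum_{k=1}^S \sqrt{H_k} \sum_{s=k}^S 2^{-s} \le \sqrt{H_0} + 2 \sum_{k=1}^S 2^{-k} \sqrt{H_k}.$$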
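The inequality of Lemma 6 rests on $\log(1 + \prod_k N_k) \le \sum_k \log(1 + N_k)$ and on interchanging two sums. The following sketch checks it numerically, assuming the reading $\sum_{s=1}^S 2^{-s}\sqrt{\bar{H}_s} \le \sqrt{H_0} + 2\sum_{s=1}^S 2^{-s}\sqrt{H_s}$ of the garbled statement; the function names and the sample bracketing numbers are hypothetical.

```python
import math

def entropies(N: list) -> tuple:
    """Given bracketing numbers N_0, ..., N_S return (H, H_bar) with
    H_s = log(1 + N_s) and H_bar_s = log(1 + prod_{k <= s} N_k)."""
    H = [math.log(1 + m) for m in N]
    H_bar, prod = [], 1
    for m in N:
        prod *= m  # exact integer product, so no overflow
        H_bar.append(math.log(1 + prod))
    return H, H_bar

def lemma6_sides(N: list) -> tuple:
    """Return (left-hand side, right-hand side) of the inequality for the sequence N."""
    H, H_bar = entropies(N)
    S = len(N) - 1
    lhs = sum(2.0 ** (-s) * math.sqrt(H_bar[s]) for s in range(1, S + 1))
    rhs = math.sqrt(H[0]) + 2.0 * sum(2.0 ** (-s) * math.sqrt(H[s]) for s in range(1, S + 1))
    return lhs, rhs

lhs, rhs = lemma6_sides([3, 10, 100, 10**4, 10**8])
```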



The next lemma inserts a special choice for the truncation levels $\{K_s\}$, and then
establishes a bound for the expectation of the supremum of the empirical process,
derived from the one of Theorem 2.

Lemma 7 Let $S$ be some integer and $\epsilon > 0$ be an arbitrary constant. Take

Ks^, := 2-^^^f — =^== A s = 1, . . . , 5, 
V3v/log(l + 7Vs) £/ 

where uAv denotes the minimum of u and v. Define as in Theorem\^ 
Wok _ 2^6 2'Ks^i _^ „ 



J 4VHE 2"^Vii:s-i + 
and $\tau := 3\sqrt{6}$. Let



Then 



where 



Vlog(l + TV,) + log(l + Ns) 



s=0 V 



Proof of Lemma 7 We have

S A r,-2s 

Ks-1 



Es = y ^ + 2-^VH + T Vlog 1 + No) + 2V6 ^ °^ 



V log(l + A^..) + ly6 2-AVi ^"g^^ + ^-^ 
3 Vn 



log(l + TVo) 



S 



-3y2-V61og(l + iV.)+y6/^.,i ^°g^^ + ^-) =/ + 

.log(l + iVo 



where 



// + ///, 



/ := 2-^^/H + 3A/61og(l + 7Vo) + e^JsT- 



// :=3^2-V61og(l + iV,), 



and 



Insert 



^ 4 2-2s0i 



log(l + Ns) 



= 1 ^ s=l 



3^^ Vlog(l + Af.) 



" A2-^, ,s = l,...,5. 



Note that Kg is decreasing in s. Moreover 

''"^^ ' 6i^._, ^°g(^ + < 4V6 2- V log(l + A^.) + 4 



Ks-i 



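We find

$$III \le 4\sqrt{6} \sum_{s=1}^S 2^{-s} \sqrt{\log(1 + \bar{N}_s)} + 4\epsilon,$$

so that

$$II + III \le 7\sqrt{6} \sum_{s=1}^S 2^{-s} \sqrt{\log(1 + \bar{N}_s)} + 4\epsilon.$$

Now apply Lemma 6. This gives

$$II + III \le 7\sqrt{6} \sqrt{\log(1 + N_0)} + 14\sqrt{6} \sum_{s=1}^S 2^{-s} \sqrt{\log(1 + N_s)} + 4\epsilon.$$

Hence,

$$I + II + III \le 2^{-S} \sqrt{n} + \frac{6^2 K \log(1 + N_0)}{\sqrt{n}} + 10\sqrt{6} \sqrt{\log(1 + N_0)} + 14\sqrt{6} \sum_{s=1}^S 2^{-s} \sqrt{\log(1 + N_s)} + 4\epsilon.$$

□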



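We now derive some bounds which will be used for obtaining the deviation inequalities
in probability and in Bernstein-Orlicz norm of Theorem 8.

Lemma 8 Let the constants $\{K_{s-1}\}_{s=1}^S$, $\{L_s\}_{s=0}^S$, and $\tau$ be as in Lemma 7. Let

$$L := \frac{1}{4} \sum_{s=0}^S 2^{-s} (1 + s) L_s.$$

Then

$$L \le \sqrt{6}K/\sqrt{n} + 2 \wedge \frac{\sqrt{6}}{\epsilon}$$

and

$$4\tau(1 + L/2) \le 6^2 K/\sqrt{n} + 24\sqrt{6}.$$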

Proof of Lemma [8] . We have 

r _ ^0 , 4^2-'Ls{l + s) 

s=l 



/n ^ V6n 

s— 1 ^ 



But 

V 2-^(1 + s)<2/ 2-''xdx^ — 
and since Hs = log(l + Ns) > log(2), 



g2) 



K,_,<2-^M—4l—A- 



3(log(2))i/2 



Hence, 

V6(log2)2U(log(2))i/2^7 
_ VQK 2 2 V6 

~ ^ 3(log2)(5/2) ^ 6(log2)2T" 

%/6if „ \/6 
Vn e 

As $\tau = 3\sqrt{6}$, we get

$$4\tau(1 + L/2) \le 6^2 K/\sqrt{n} + 24\sqrt{6}.$$

□

7 Proof of Theorems 7 and 8

Proof of Theorem 7 This follows from Theorem 2, Theorem 6 and Lemma 7
with $\epsilon = 0$. □

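Proof of Theorem 8 Let $t > 0$ be arbitrary. Note that $E_S$ is as in Lemma 7.
Apply the bounds of Lemma 8 with $\epsilon = 3\sqrt{t}$ for the constant $L$ defined there. Then

$$\tau(4 + 2L) + 4\epsilon + 4\tau \left[ \sqrt{t} + \frac{Lt}{2} \right] \le 6^2 K/\sqrt{n} + 24\sqrt{6} + 4\epsilon + 12\sqrt{6t} + 2\tau \frac{\sqrt{6}K}{\sqrt{n}} t + 2\tau \frac{\sqrt{6}}{\epsilon} t$$

$$= 6^2 K/\sqrt{n} + 6^2 K t/\sqrt{n} + 24\sqrt{6} + 12\sqrt{6t} + 24\sqrt{t}$$

$$\le 6^2 K/\sqrt{n} + 24\sqrt{6} + 24\sqrt{6} \left[ \sqrt{t} + \frac{Lt}{2} \right],$$

where, in the last line,

$$L := \frac{\sqrt{6}K}{2\sqrt{n}}.$$

Then by Theorem 4

$$P\left( \sup_{g \in \mathcal{G}} |\nu_n(g)| \ge \min_S E_S + 6^2 K/\sqrt{n} + 24\sqrt{6} + 24\sqrt{6} \left[ \sqrt{t} + \frac{Lt}{2} \right] \right) \le 2\exp[-t],$$

and by Lemma 2

$$\left\| \left( \sup_{g \in \mathcal{G}} |\nu_n(g)| - \left( \min_S E_S + 6^2 K/\sqrt{n} + 24\sqrt{6} \right) \right)_+ \right\|_{\Psi_{\sqrt{3}L}} \le 72\sqrt{2}.$$

□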